Query Execution on a Replicated and Partitioned Database

Authors

  • Neha Narula
  • Robert T. Morris
  • Terry P. Orlando

Abstract

Web application developers partition and replicate their data amongst a set of SQL databases to achieve higher throughput. Given multiple copies of tables partitioned in different ways, developers must manually select different replicas in their application code. This work presents Dixie, a query planner and executor that automatically executes queries over replicas of partitioned data stored in a set of relational databases, and optimizes for high throughput. The challenge in choosing a good query plan lies in predicting query cost, which Dixie does by balancing row retrieval costs against the overhead of contacting many servers to execute a query. For web workloads, per-query overhead in the servers is a large part of the overall cost of execution. Dixie's cost calculation tends to minimize the number of servers used to satisfy a query, which is essential for minimizing this query overhead and obtaining high throughput; this is in direct contrast to optimizers over large data sets, which try to maximize parallelism by spreading the execution of a query over all the servers. Dixie automatically takes advantage of the addition or removal of replicas without requiring changes to the application code. We show that Dixie sometimes chooses plans that existing parallel database query optimizers might not consider. For certain queries, Dixie chooses a plan that gives a 2.3x improvement in overall system throughput over a plan that does not take per-server query overhead costs into account. Using table replicas, Dixie provides a 35% throughput improvement over a naive execution without replicas on an artificial workload generated by Pinax, an open source social web site.

Thesis Supervisor: Robert T. Morris
Title: Professor
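The trade-off described in the abstract, row retrieval cost versus the fixed overhead of contacting each server, can be sketched with a toy cost model. This is a minimal illustration under stated assumptions, not Dixie's actual cost model: the unit costs `PER_SERVER_OVERHEAD` and `PER_ROW_COST`, the `Plan` type, and the two candidate plans are all hypothetical. Each plan's estimated cost is the number of servers it contacts times a per-query overhead, plus the number of rows it retrieves times a per-row cost.

```python
from dataclasses import dataclass

# Hypothetical unit costs, chosen only for illustration (not measured values).
PER_SERVER_OVERHEAD = 1.0   # fixed cost of sending one query to one server
PER_ROW_COST = 0.01         # cost of retrieving one row


@dataclass(frozen=True)
class Plan:
    name: str
    servers: int  # number of servers the plan contacts
    rows: int     # total rows retrieved across those servers


def cost(plan: Plan, per_server: float = PER_SERVER_OVERHEAD,
         per_row: float = PER_ROW_COST) -> float:
    """Estimated cost: per-server query overhead plus row retrieval cost."""
    return plan.servers * per_server + plan.rows * per_row


def choose(plans, **weights) -> Plan:
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda p: cost(p, **weights))


# Hypothetical candidates for one query over two replicas of the same table.
plans = [
    # Replica partitioned on the predicate column: one server, more rows read.
    Plan("one_partition_scan", servers=1, rows=400),
    # Replica partitioned on another column: all 8 partitions probed via an index.
    Plan("all_partitions_index", servers=8, rows=20),
]

best = choose(plans)                       # counts per-server overhead
rows_only = choose(plans, per_server=0.0)  # ignores per-server overhead
print(best.name, rows_only.name)
```

With the per-server term included, the single-partition plan wins (about 5.0 versus 8.2 cost units); setting `per_server=0.0` mimics an optimizer that only counts rows, which instead prefers fanning the query out to all eight partitions.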


Similar resources

Generating Efficient Execution Plans for Vertically Partitioned XML Databases

Experience with relational systems has shown that distribution is an effective way of improving the scalability of query evaluation. In this paper, we show how distributed query evaluation can be performed in a vertically partitioned XML database system. We propose a novel technique for constructing distributed execution plans that is independent of local query evaluation strategies. We then pr...


A Hash Partition Strategy for Distributed Query Processing

This paper describes a hash partitioning strategy for distributed query processing in a multi-database environment in which relations are unfragmented and replicated. Methods and efficient algorithms are provided to determine the sets of relations that can be hash partitioned, the copies of the relations to be partitioned and the partition sites, how the relations are to be partitioned and where...


A Heuristic Algorithm for Partition

This paper describes an improvement on a Partition and Replicate Strategy (PRS) for distributed query processing in a multi-database environment in which relations are unfragmented and replicated. For a given set of relations to be partitioned, determining the optimal copies of the relations to be partitioned is NP-hard. A heuristic algorithm is proposed. Our experimental results show that the ...


OLAP Query Processing for Partitioned Data Warehouses

On-line analytical processing (OLAP) queries can take hours or even days to execute on very large data warehouses. Therefore, there is a need to employ techniques that can facilitate efficient execution of these queries. Data partitioning, a concept that has been studied in the context of relational databases, aims to reduce query execution time and facilitate the parallel execution of queries. In ...


Performance of Catalog Management Schemes for Running Access in a Locally Distributed Database System

Catalog management schemes affect many aspects of distributed database systems such as site autonomy, query optimization, view management, authorization mechanism, and data distribution transparency. However, the performance comparison of various catalog management schemes has received relatively little attention. Embedded read queries to the catalogs in a form of data manipulation statements a...



Publication date: 2011